{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "view-in-github"
      },
      "source": [
        "\u003ca href=\"https://colab.research.google.com/github/tensorflow/tpu/blob/0e3cfbdfbcf321681c1ac1c387baf7a1a41d8d21/tools/colab/classification_iris_data_with_tpuestimator.ipynb\" target=\"_parent\"\u003e\u003cimg src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/\u003e\u003c/a\u003e"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "N6ZDpd9XzFeN"
      },
      "source": [
        "##### Copyright 2018 The TensorFlow Hub Authors.\n",
        "\n",
        "Licensed under the Apache License, Version 2.0 (the \"License\");"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "cellView": "form",
        "colab": {},
        "colab_type": "code",
        "id": "KUu4vOt5zI9d"
      },
      "outputs": [],
      "source": [
        "# Copyright 2018 The TensorFlow Hub Authors. All Rights Reserved.\n",
        "#\n",
        "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "#     http://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License.\n",
        "# =============================================================================="
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "a0TLC65g9W5n"
      },
      "source": [
        "# A Simple Classification Model using TPUEstimator on Colab TPU\n",
        "\n",
        "## Overview\n",
        "\n",
        "This notebook shows how to use TPUEstimator to build a simple classification model. The model can train, evaluate, and generate predictions using Cloud TPUs. It uses the iris dataset to predict the species of the flower and also shows how to use your own data instead of using pre-loaded data. This model uses 4 input features ***(SepalLength, SepalWidth, PetalLength, PetalWidth)*** to determine one of these flower species ***(Setosa, Versicolor, Virginica)***.\n",
        "\n",
        "The model trains for 20 epochs and completes in approximately 2 minutes.\n",
        "\n",
        "This notebook is hosted on GitHub. To view it in its original repository, after opening the notebook, select **File \u003e View on GitHub**.\n",
        "### Note: You will need a GCP account and a GCS bucket for this notebook to run!"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "IVJtRN9AWsCy"
      },
      "source": [
        "## Instructions\n",
        "\u003ch3\u003e\u003ca href=\"https://cloud.google.com/tpu/\"\u003e\u003cimg valign=\"middle\" src=\"https://raw.githubusercontent.com/GoogleCloudPlatform/tensorflow-without-a-phd/master/tensorflow-rl-pong/images/tpu-hexagon.png\" width=\"50\"\u003e\u003c/a\u003e  \u0026nbsp;\u0026nbsp;Train on TPU\u003c/h3\u003e\n",
        "\n",
        "   1. Create a Cloud Storage bucket for your TensorBoard logs at http://console.cloud.google.com/storage and fill in the bucket name in the \"Resolve TPU Address and authenticate GCS bucket\" cell below.\n",
        " \n",
        "   1. On the main menu, click Runtime and select **Change runtime type**. Set \"TPU\" as the hardware accelerator.\n",
        "   1. Click Runtime again and select **Runtime \u003e Run All** (Watch out: the \"Resolve TPU address and authenticate GCS bucket\" cell requires a bucket name). You can also run the cells manually with Shift-ENTER.\n",
        "\n",
        "TPUs are located in Google Cloud, for optimal performance, they read data directly from Google Cloud Storage (GCS)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "dgAHfQtuhddd"
      },
      "source": [
        "## Learning objectives\n",
        "\n",
        "In this Colab, you will learn how to:\n",
        "*   Define and build a TPUEstimator model with 2 hidden layers and 10 nodes in each layer.\n",
        "*   Run the TPUEstimator model to train, evaluate, and and generate predictions on Cloud TPU.\n",
        "\n",
        "\n",
        "\n",
        "\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "Lvo0t7XVIkWZ"
      },
      "source": [
        "## Data, model, and training"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "ACiNzAp8AoWR"
      },
      "source": [
        "### Imports"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "IgQge6h7AqDw"
      },
      "outputs": [],
      "source": [
        "from __future__ import absolute_import\n",
        "from __future__ import division\n",
        "from __future__ import print_function\n",
        "\n",
        "import json\n",
        "import os\n",
        "import pandas as pd\n",
        "import pprint\n",
        "import tensorflow as tf\n",
        "import time"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "7FCdy1aBAsXe"
      },
      "source": [
        "### Resolve TPU Address and authenticate GCS bucket"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "phzyD8iCAzcp"
      },
      "outputs": [],
      "source": [
        "use_tpu = True #@param {type:\"boolean\"}\n",
        "bucket = '' #@param {type:\"string\"}\n",
        "\n",
        "assert bucket, 'Must specify an existing GCS bucket name'\n",
        "print('Using bucket: {}'.format(bucket))\n",
        "\n",
        "if use_tpu:\n",
        "    assert 'COLAB_TPU_ADDR' in os.environ, 'Missing TPU; did you request a TPU in Notebook Settings?'\n",
        "\n",
        "MODEL_DIR = 'gs://{}/{}'.format(bucket, time.strftime('tpuestimator-dnn/%Y-%m-%d-%H-%M-%S'))\n",
        "print('Using model dir: {}'.format(MODEL_DIR))\n",
        "\n",
        "from google.colab import auth\n",
        "auth.authenticate_user()\n",
        "\n",
        "if 'COLAB_TPU_ADDR' in os.environ:\n",
        "  TF_MASTER = 'grpc://{}'.format(os.environ['COLAB_TPU_ADDR'])\n",
        "\n",
        "  # Upload credentials to TPU.\n",
        "  with tf.Session(TF_MASTER) as sess:\n",
        "    with open('/content/adc.json', 'r') as f:\n",
        "      auth_info = json.load(f)\n",
        "    tf.contrib.cloud.configure_gcs(sess, credentials=auth_info)\n",
        "  # Now credentials are set for all future sessions on this TPU.\n",
        "else:\n",
        "  TF_MASTER=''\n",
        "\n",
        "with tf.Session(TF_MASTER) as session:\n",
        "  print ('List of devices:')\n",
        "  pprint.pprint(session.list_devices())"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "r0AEjD1vKGXO"
      },
      "source": [
        "### FLAGS used as model params"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "0tcdTWw1KLiF"
      },
      "outputs": [],
      "source": [
        "# Model specific parameters\n",
        "\n",
        "# TPU address\n",
        "tpu_address = TF_MASTER\n",
        "\n",
        "# Estimators model_dir\n",
        "model_dir = MODEL_DIR\n",
        "\n",
        "# This is the global batch size, not the per-shard batch.\n",
        "batch_size = 128\n",
        "\n",
        "# Total number of training steps.\n",
        "train_steps = 1000\n",
        "\n",
        "# Total number of evaluation steps. If '0', evaluation after training is skipped\n",
        "eval_steps = 4\n",
        "\n",
        "# Number of iterations per TPU training loop\n",
        "iterations = 500"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "I3Ckb7SEKSGO"
      },
      "source": [
        "### Get input data and define input functions"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "Y1qZ3cWyKb-b"
      },
      "outputs": [],
      "source": [
        "TRAIN_URL = \"http://download.tensorflow.org/data/iris_training.csv\"\n",
        "TEST_URL = \"http://download.tensorflow.org/data/iris_test.csv\"\n",
        "\n",
        "CSV_COLUMN_NAMES = ['SepalLength', 'SepalWidth',\n",
        "                    'PetalLength', 'PetalWidth', 'Species']\n",
        "SPECIES = ['Setosa', 'Versicolor', 'Virginica']\n",
        "\n",
        "PREDICTION_INPUT_DATA = {\n",
        "    'SepalLength': [6.9, 5.1, 5.9],\n",
        "    'SepalWidth': [3.1, 3.3, 3.0],\n",
        "    'PetalLength': [5.4, 1.7, 4.2],\n",
        "    'PetalWidth': [2.1, 0.5, 1.5],\n",
        "}\n",
        "\n",
        "PREDICTION_OUTPUT_DATA = ['Virginica', 'Setosa', 'Versicolor']\n",
        "\n",
        "def maybe_download():\n",
        "    train_path = tf.keras.utils.get_file(TRAIN_URL.split('/')[-1], TRAIN_URL)\n",
        "    test_path = tf.keras.utils.get_file(TEST_URL.split('/')[-1], TEST_URL)\n",
        "\n",
        "    return train_path, test_path\n",
        "\n",
        "def load_data(y_name='Species'):\n",
        "    \"\"\"Returns the iris dataset as (train_x, train_y), (test_x, test_y).\"\"\"\n",
        "    train_path, test_path = maybe_download()\n",
        "\n",
        "    train = pd.read_csv(train_path, names=CSV_COLUMN_NAMES, header=0, dtype={'SepalLength': pd.np.float32,\n",
        "        'SepalWidth': pd.np.float32, 'PetalLength': pd.np.float32, 'PetalWidth': pd.np.float32, 'Species': pd.np.int32})\n",
        "    train_x, train_y = train, train.pop(y_name)\n",
        "\n",
        "    test = pd.read_csv(test_path, names=CSV_COLUMN_NAMES, header=0, dtype={'SepalLength': pd.np.float32,\n",
        "        'SepalWidth': pd.np.float32, 'PetalLength': pd.np.float32, 'PetalWidth': pd.np.float32, 'Species': pd.np.int32})\n",
        "    test_x, test_y = test, test.pop(y_name)\n",
        "\n",
        "    return (train_x, train_y), (test_x, test_y)\n",
        "\n",
        "\n",
        "def train_input_fn(features, labels, batch_size):\n",
        "    \"\"\"An input function for training\"\"\"\n",
        "\n",
        "    # Convert the inputs to a Dataset.\n",
        "    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))\n",
        "\n",
        "    # Shuffle, repeat, and batch the examples.\n",
        "    dataset = dataset.shuffle(1000).repeat()\n",
        "\n",
        "    dataset = dataset.apply(\n",
        "            tf.contrib.data.batch_and_drop_remainder(batch_size))\n",
        "\n",
        "    # Return the dataset.\n",
        "    return dataset\n",
        "\n",
        "\n",
        "def eval_input_fn(features, labels, batch_size):\n",
        "    \"\"\"An input function for evaluation\"\"\"\n",
        "    features=dict(features)\n",
        "    inputs = (features, labels)\n",
        "\n",
        "    # Convert the inputs to a Dataset.\n",
        "    dataset = tf.data.Dataset.from_tensor_slices(inputs)\n",
        "    dataset = dataset.shuffle(1000).repeat()\n",
        "\n",
        "    dataset = dataset.apply(\n",
        "            tf.contrib.data.batch_and_drop_remainder(batch_size))\n",
        "\n",
        "    # Return the dataset.\n",
        "    return dataset\n",
        "\n",
        "\n",
        "def predict_input_fn(features, batch_size):\n",
        "    \"\"\"An input function for prediction\"\"\"\n",
        "\n",
        "    dataset = tf.data.Dataset.from_tensor_slices(features)\n",
        "    dataset = dataset.batch(batch_size)\n",
        "    return dataset"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "ktD3gwqHKijt"
      },
      "source": [
        "### Model and metric function"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "mf-xZIlJKn9j"
      },
      "outputs": [],
      "source": [
        "def metric_fn(labels, logits):\n",
        "    \"\"\"Function to return metrics for evaluation\"\"\"\n",
        "\n",
        "    predicted_classes = tf.argmax(logits, 1)\n",
        "    accuracy = tf.metrics.accuracy(labels=labels,\n",
        "                                   predictions=predicted_classes,\n",
        "                                   name='acc_op')\n",
        "    return {'accuracy': accuracy}\n",
        "\n",
        "\n",
        "def my_model(features, labels, mode, params):\n",
        "    \"\"\"DNN with three hidden layers, and dropout of 0.1 probability.\"\"\"\n",
        "\n",
        "    # Create three fully connected layers each layer having a dropout\n",
        "    # probability of 0.1.\n",
        "    net = tf.feature_column.input_layer(features, params['feature_columns'])\n",
        "    for units in params['hidden_units']:\n",
        "        net = tf.layers.dense(net, units=units, activation=tf.nn.relu)\n",
        "\n",
        "    # Compute logits (1 per class).\n",
        "    logits = tf.layers.dense(net, params['n_classes'], activation=None)\n",
        "\n",
        "    # Compute predictions.\n",
        "    predicted_classes = tf.argmax(logits, 1)\n",
        "    if mode == tf.estimator.ModeKeys.PREDICT:\n",
        "        predictions = {\n",
        "            'class_ids': predicted_classes[:, tf.newaxis],\n",
        "            'probabilities': tf.nn.softmax(logits),\n",
        "            'logits': logits,\n",
        "        }\n",
        "        return tf.contrib.tpu.TPUEstimatorSpec(mode, predictions=predictions)\n",
        "\n",
        "    # Compute loss.\n",
        "    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels,\n",
        "                                                  logits=logits)\n",
        "\n",
        "    if mode == tf.estimator.ModeKeys.EVAL:\n",
        "        return tf.contrib.tpu.TPUEstimatorSpec(\n",
        "            mode=mode, loss=loss, eval_metrics=(metric_fn, [labels, logits]))\n",
        "\n",
        "    # Create training op.\n",
        "    if mode == tf.estimator.ModeKeys.TRAIN:\n",
        "        optimizer = tf.train.AdagradOptimizer(learning_rate=0.1)\n",
        "        if use_tpu:\n",
        "            optimizer = tf.contrib.tpu.CrossShardOptimizer(optimizer)\n",
        "        train_op = optimizer.minimize(loss, global_step=tf.train.get_global_step())\n",
        "        return tf.contrib.tpu.TPUEstimatorSpec(mode, loss=loss, train_op=train_op)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "-HuwjdowKqbb"
      },
      "source": [
        "### Main function"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "gYByHnboKuQc"
      },
      "outputs": [],
      "source": [
        "def main():\n",
        "    # Fetch the data\n",
        "    (train_x, train_y), (test_x, test_y) = load_data()\n",
        "\n",
        "    # Feature columns describe how to use the input.\n",
        "    my_feature_columns = []\n",
        "    for key in train_x.keys():\n",
        "        my_feature_columns.append(tf.feature_column.numeric_column(key=key))\n",
        "\n",
        "    # Resolve TPU cluster and runconfig for this.\n",
        "    tpu_cluster_resolver = tf.contrib.cluster_resolver.TPUClusterResolver(\n",
        "            tpu_address)\n",
        "\n",
        "    run_config = tf.contrib.tpu.RunConfig(\n",
        "            model_dir=model_dir,\n",
        "            cluster=tpu_cluster_resolver,\n",
        "            session_config=tf.ConfigProto(\n",
        "                allow_soft_placement=True, log_device_placement=True),\n",
        "            tpu_config=tf.contrib.tpu.TPUConfig(iterations),\n",
        "            )\n",
        "\n",
        "    # Build 2 hidden layer DNN with 10, 10 units respectively.\n",
        "    classifier = tf.contrib.tpu.TPUEstimator(\n",
        "        model_fn=my_model,\n",
        "        use_tpu=use_tpu,\n",
        "        train_batch_size=batch_size,\n",
        "        eval_batch_size=batch_size,\n",
        "        predict_batch_size=batch_size,\n",
        "        config=run_config,\n",
        "        params={\n",
        "            'feature_columns': my_feature_columns,\n",
        "            # Two hidden layers of 10 nodes each.\n",
        "            'hidden_units': [10, 10],\n",
        "            # The model must choose between 3 classes.\n",
        "            'n_classes': 3,\n",
        "            'use_tpu': use_tpu,\n",
        "        })\n",
        "\n",
        "    # Train the Model.\n",
        "    classifier.train(\n",
        "            input_fn = lambda params: train_input_fn(\n",
        "                train_x, train_y, params[\"batch_size\"]),\n",
        "            max_steps=train_steps)\n",
        "\n",
        "    # Evaluate the model.\n",
        "    eval_result = classifier.evaluate(\n",
        "        input_fn = lambda params: eval_input_fn(\n",
        "            test_x, test_y, params[\"batch_size\"]),\n",
        "        steps=eval_steps)\n",
        "\n",
        "    print('\\nTest set accuracy: {accuracy:0.3f}\\n'.format(**eval_result))\n",
        "\n",
        "    # Generate predictions from the model\n",
        "    predictions = classifier.predict(\n",
        "        input_fn = lambda params: predict_input_fn(\n",
        "            PREDICTION_INPUT_DATA, params[\"batch_size\"]))\n",
        "\n",
        "    for pred_dict, expec in zip(predictions, PREDICTION_OUTPUT_DATA):\n",
        "        template = ('\\nPrediction is \"{}\" ({:.1f}%), expected \"{}\"')\n",
        "\n",
        "        class_id = pred_dict['class_ids'][0]\n",
        "        probability = pred_dict['probabilities'][class_id]\n",
        "\n",
        "        print(template.format(SPECIES[class_id],\n",
        "                              100 * probability, expec))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "HiDECOvpKvsr"
      },
      "source": [
        "## Run the model and view the predictions"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 0,
      "metadata": {
        "colab": {},
        "colab_type": "code",
        "id": "hZWES3ZsKzNc"
      },
      "outputs": [],
      "source": [
        "main()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "colab_type": "text",
        "id": "2a5cGsSTEBQD"
      },
      "source": [
        "## What's next\n",
        "\n",
        "* Learn how to run the same model using [Keras](https://colab.research.google.com/github/tensorflow/tpu/blob/0e3cfbdfbcf321681c1ac1c387baf7a1a41d8d21/tools/colab/classification_iris_data_with_keras.ipynb) rather than TPUEstimator.\n",
        "* Learn about [Cloud TPUs](https://cloud.google.com/tpu/docs) that Google designed and optimized specifically to speed up and scale up ML workloads for training and inference and to enable ML engineers and researchers to iterate more quickly.\n",
        "* Explore the range of [Cloud TPU tutorials and Colabs](https://cloud.google.com/tpu/docs/tutorials) to find other examples that can be used when implementing your ML project.\n",
        "\n",
        "On Google Cloud Platform, in addition to GPUs and TPUs available on pre-configured [deep learning VMs](https://cloud.google.com/deep-learning-vm/),  you will find [AutoML](https://cloud.google.com/automl/)*(beta)* for training custom models without writing code and [Cloud ML Engine](https://cloud.google.com/ml-engine/docs/) which will allows you to run parallel trainings and hyperparameter tuning of your custom models on powerful distributed hardware.\n"
      ]
    }
  ],
  "metadata": {
    "accelerator": "TPU",
    "colab": {
      "collapsed_sections": [
        "N6ZDpd9XzFeN"
      ],
      "name": "Simple Classification Model using TPUEstimator on Colab TPU.ipynb",
      "provenance": [],
      "version": "0.3.2"
    },
    "kernelspec": {
      "display_name": "Python 2",
      "name": "python2"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
